{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NLP Course 2 Week 1 Lesson : Building The Model - Lecture Exercise 02\n",
    "Estimated Time: 20 minutes\n",
    "<br>\n",
    "# Candidates from String Edits\n",
    "Create a list of candidate strings by applying an edit operation\n",
    "<br>\n",
    "### Imports and Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# data\n",
    "word = 'dearz' # 🦌"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Splits\n",
    "Find all the ways you can split a word into 2 parts !"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# splits with a loop\n",
    "splits_a = []\n",
    "for i in range(len(word)+1):\n",
    "    splits_a.append([word[:i],word[i:]])\n",
    "\n",
    "for i in splits_a:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# same splits, done using a list comprehension\n",
    "splits_b = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n",
    "\n",
    "for i in splits_b:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete Edit\n",
    "Delete a letter from each string in the `splits` list.\n",
    "<br>\n",
    "What this does is effectivly delete each possible letter from the original word being edited. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# deletes with a loop\n",
    "splits = splits_a\n",
    "deletes = []\n",
    "\n",
    "print('word : ', word)\n",
    "for L,R in splits:\n",
    "    if R:\n",
    "        print(L + R[1:], ' <-- delete ', R[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's worth taking a closer look at how this is excecuting a 'delete'.\n",
    "<br>\n",
    "Taking the first item from the `splits` list :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# breaking it down\n",
    "print('word : ', word)\n",
    "one_split = splits[0]\n",
    "print('first item from the splits list : ', one_split)\n",
    "L = one_split[0]\n",
    "R = one_split[1]\n",
    "print('L : ', L)\n",
    "print('R : ', R)\n",
    "print('*** now implicit delete by excluding the leading letter ***')\n",
    "print('L + R[1:] : ',L + R[1:], ' <-- delete ', R[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the end result transforms **'dearz'** to **'earz'** by deleting the first character.\n",
    "<br>\n",
    "And you use a **loop** (code block above) or a **list comprehension** (code block below) to do\n",
    "<br>\n",
    "this for the entire `splits` list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# deletes with a list comprehension\n",
    "splits = splits_a\n",
    "deletes = [L + R[1:] for L, R in splits if R]\n",
    "\n",
    "print(deletes)\n",
    "print('*** which is the same as ***')\n",
    "for i in deletes:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ungraded Exercise\n",
    "You now have a list of ***candidate strings*** created after performing a **delete** edit.\n",
    "<br>\n",
    "Next step will be to filter this list for ***candidate words*** found in a vocabulary.\n",
    "<br>\n",
    "Given the example vocab below, can you think of a way to create a list of candidate words ? \n",
    "<br>\n",
    "Remember, you already have a list of candidate strings, some of which are certainly not actual words you might find in your vocabulary !\n",
    "<br>\n",
    "<br>\n",
    "So from the above list **earz, darz, derz, deaz, dear**. \n",
    "<br>\n",
    "You're really only interested in **dear**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vocab = ['dean','deer','dear','fries','and','coke']\n",
    "edits = list(deletes)\n",
    "\n",
    "print('vocab : ', vocab)\n",
    "print('edits : ', edits)\n",
    "\n",
    "candidates=[]\n",
    "\n",
    "### START CODE HERE ###\n",
    "#candidates = ??  # hint: 'set.intersection'\n",
    "### END CODE HERE ###\n",
    "\n",
    "print('candidate words : ', candidates)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Expected Outcome:\n",
    "\n",
    "vocab :  ['dean', 'deer', 'dear', 'fries', 'and', 'coke']\n",
    "<br>\n",
    "edits :  ['earz', 'darz', 'derz', 'deaz', 'dear']\n",
    "<br>\n",
    "candidate words :  {'dear'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary\n",
    "You've unpacked an integral part of the assignment by breaking down **splits** and **edits**, specifically looking at **deletes** here.\n",
    "<br>\n",
    "Implementation of the other edit types (insert, replace, switch) follow a similar methodology and should now feel somewhat familiar when you see them.\n",
    "<br>\n",
    "This bit of the code isn't as intuitive as other sections, so well done!\n",
    "<br>\n",
    "You should now feel confident facing some of the more technical parts of the assignment at the end of the week."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
